Abstract: This work presents novel distributed data collection systems and storage algorithms for collaborative learning wireless sensor networks (WSNs). In a large WSN, consider n collaborative sensor devices distributed randomly to acquire information and learn about a certain field. Such sensors have limited power, small bandwidth, and little memory, and they might disappear from the network after some period of operation. The goal of this work is to design efficient strategies for learning about the field by collecting sensed data from these n sensors with low computational overhead and efficient storage encoding operations.

In this data collection system, we propose two distributed data storage algorithms (DSAs) that solve this problem by means of network flooding and connectivity among sensor devices. In the first algorithm, denoted DSA-I, we assume that every node in the network knows the total number of nodes n. We show that this algorithm is efficient in terms of its encoding/decoding operations. Furthermore, every node uses network flooding to disseminate its data throughout the network with a mixing time of approximately $O(n)$. In the second algorithm, denoted DSA-II, we assume that the learning sensors do not know the total number of nodes, so the dissemination of data does not depend on the value of n. In this case we show that the encoding operations take $O(C\mu^2)$, where $\mu$ is the mean degree of the network graph and C is a system parameter. The simulated performance of both algorithms matches the derived theoretical results. Finally, we show how to deploy these algorithms to monitor and measure certain phenomena in American-made camp tents located in the Minna field on the south-east side of Makkah.



I. Introduction 

The field of information technology has expanded remarkably, especially since the appearance of the World Wide Web two decades ago. This expansion has been accompanied by the emergence of several branches of communication networking, such as wireless sensor networks. Wireless sensor networks (WSNs) consist of small devices (nodes) with low CPU power, small bandwidth, and limited memory. They can be deployed in isolated, disaster-stricken, and inaccessible fields to monitor objects, detect fires or floods, measure temperatures, transmit media streams, and so on. They can also be used in areas that are difficult or dangerous for humans to reach. There has been extensive research on sensor networks to improve their services, power consumption, and operation [12]. They have attracted much attention recently because of their variety of applications. Much research has been
Fig. 1. A wireless sensor network consists of various small devices with limited CPU power, small memory, and limited bandwidth. Collaborative sensor nodes are distributed randomly to monitor, collect data, and learn about the Minna field in the east of Makkah. Approximately 50,000 camp tents are located in Minna to accommodate 3-5 million people for 4-8 days during pilgrimage, according to 2010 KSA statistics.



done in both academia and industry to improve their reliability, usage, and operation.

We consider a model for large-scale wireless sensor networks in which n data collection and storage sensor nodes are distributed uniformly at random. These n nodes are deployed to collect information and transmit media streams (images, videos, texts) about a certain field. These n sensor devices have a short time-to-live and limited memory, and they might disappear from the network at any time. Also, the nodes do not know the locations of neighboring nodes, and they do not maintain routing tables to forward messages. We assume that the n sensing and data collection nodes generate independent packets, which can be classified as initialization or update packets sent at arbitrary times. A packet initiated from a node u contains its ID $ID_u$, a time-to-live parameter, and the sensed data. In addition, every storage node u has a buffer of size M that can be divided into m small buffers to save other neighbors' data. Every storage node decides randomly and independently whether to accept or reject each packet it receives. Also, a packet is discarded once it has traveled $O(n)$ hops through the network.

The goal of this work is to develop an efficient method to randomly disseminate the information sensed by the n sensors to all n storage nodes and to collect it. A data collector with high computational power can then query any $(1+\epsilon)n/m$ storage nodes, for $\epsilon > 0$, and easily retrieve information about the



Fig. 2. A wireless sensor device equipped with several sensor components to measure temperature, gas, pollution, and CO2.



Fig. 3. A WSN with n nodes randomly distributed in a field. A node $s_i$ determines its degree $d(s_i)$ by sending a flooding message to its neighboring nodes.



n sensor nodes with high probability. Other versions of this problem have been solved using centralized coding (e.g., Fountain codes, MDS codes, and linear codes) by adding redundancy, where a node can send its data to a pre-selected set of other nodes in the network (TJ, Q, Q, ifTUl . Over a distributed random network this approach is unreliable, since we still need a strategy to distribute the information from the sources to a set of arbitrary storage nodes. Hence, a decentralized solution is needed in which the data collector and storage nodes are distributed randomly and independently. The problem considered here is therefore a network storage problem rather than a network transmission problem. The latter assumes that channel coding and modulation theory are used to handle the transmission from a source to a destination, whereas the former requires distributed network storage algorithms that protect information against node failures or disappearance. We assume that all nodes trust each other's data and that attackers are unable to compromise the nodes' transmissions. The motivations for this work are as follows:

i) We consider a realistic model for WSNs, where nodes are distributed randomly and have limited power and memory.

ii) The encoding and decoding operations are performed linearly.

iii) Querying only a subset of $(1+\epsilon)n/m$ nodes reveals information about all nodes.

iv) The proposed storage algorithms have lower computational complexity than the related work discussed in Section IX.

This work is organized as follows. In Section IX we present background and a short survey of related work. In Section II we introduce the network model. In Sections III and V we propose two storage algorithms, and we provide their analysis in Sections IV and VI, respectively. In Section VII we present simulation studies of the proposed algorithms, and the work is concluded in Section X.

II. Network Model and Assumptions 

In this section we present the network model and problem definition. Consider a wireless sensor network $\mathcal{N}$ with n sensor nodes uniformly distributed at random in a region $A = [L,L]^2$ for some integer $L > 1$. The network model $\mathcal{N}$ can be represented as an abstract graph $G = (V, E)$ with a set of nodes V and a set of edges E. The set V represents the sensors $S = \{s_1, s_2, \ldots, s_n\}$ that measure information about a specific field, and E represents the set of connections (links) between the sensors in S. Two arbitrary sensors $s_i$ and $s_j$ are connected if each lies within the other's transmission range.

We assume that the network is dense, meaning that with high probability there are no isolated nodes. Let $r > 0$ be a fraction (a design parameter). Two nodes u and v in V are connected in G if and only if the distance between them is bounded by r, i.e., $0 < d(u,v) \le r$. Equivalently, let z be a random variable representing the existence of an edge between two arbitrary nodes u and v. Then

$$z = \begin{cases} 1, & d(u,v) \le r, \\ 0, & \text{otherwise}. \end{cases} \tag{1}$$

One can guarantee such condition by assuming that the 
radius r > O(^). 
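The edge rule in (1) can be illustrated with a small random geometric graph. The following Python sketch is only illustrative: the function names, the square region [0, side]², and the parameter values are assumptions chosen for the example, not values from this work. It builds adjacency sets and reports the mean degree μ and any isolated nodes, which is a quick way to check whether a chosen r satisfies the density assumption.

```python
import math
import random


def random_geometric_graph(n, side, r, seed=0):
    """Place n nodes uniformly at random in a side x side square and link every
    pair whose Euclidean distance is at most r (z = 1 in Eq. (1))."""
    rng = random.Random(seed)
    pos = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(pos[i], pos[j]) <= r:
                adj[i].add(j)
                adj[j].add(i)
    return pos, adj


def degree_stats(adj):
    """Return the degrees d(s_i), the mean degree mu, and the isolated nodes."""
    degrees = {u: len(nbrs) for u, nbrs in adj.items()}
    mu = sum(degrees.values()) / len(degrees)
    isolated = [u for u, d in degrees.items() if d == 0]
    return degrees, mu, isolated


if __name__ == "__main__":
    _, adj = random_geometric_graph(n=100, side=1.0, r=0.25)
    _, mu, isolated = degree_stats(adj)
    print(f"mean degree = {mu:.2f}, isolated nodes = {len(isolated)}")
```

Increasing r (or n for a fixed region) raises μ and makes isolated nodes increasingly unlikely.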

A. Assumptions 

We make the following assumptions about the network model $\mathcal{N}$:

i) Let $S = \{s_1, \ldots, s_n\}$ be a set of sensing nodes distributed uniformly at random in a field. These nodes also form the set of storage nodes. This assumption differentiates our work from the problems considered in |0, d.

ii) No node maintains routing or geographic tables, and the network topology is not known. Every node $s_i$ can send a flooding message to its neighboring nodes. Every node $s_i$ can also determine its total number of neighbors by sending a simple flooding query message: every node that replies to this message is a neighbor of $s_i$. Therefore, our work is more general than, and different from, the work done in [4], [6]. The degree $d(u)$ of a node u is its total number of neighbors with a direct connection.

iii) Every node has a buffer of size M that can be divided into smaller buffers, each of size c, such that $m = \lfloor M/c \rfloor$. Hence, all nodes have the same number of
Fig. 4. Every node $s_i$ has a buffer of size M that is divided into m small buffers. The node $s_i$ decides with a certain probability whether to accept or reject a data item $x_{s_j}$ and in which of its buffers to save it.



buffers. Also, the first buffer of a node u is reserved for 
its own sensing data. 

iv) Every node $s_i$ prepares a packet $\text{packet}_{s_i}$ with its ID, sensed data $x_{s_i}$, counter $c(x_{s_i})$, and a flag that is set to zero or one:

$$\text{packet}_{s_i} = (ID_{s_i}, x_{s_i}, c(x_{s_i}), \text{flag}) \tag{2}$$

The flag is set to zero when a sensor sends its data for the first time; otherwise it is set to one to indicate a data update.

v) We consider two types of packets: initialization and update packets. The two cases are distinguished by a flag that takes the values zero and one. If the source node sends a packet with the flag set to zero, the packet is treated as an initialization packet; otherwise, it is treated as an update packet. The packets sent by all sources at the beginning of the sensing phase are initialization packets.

vi) Every node draws a degree $d_u$ from a degree distribution $\Omega$. If a node decides to accept a packet, it also decides in which buffer the packet will be stored.

When a node $s_j$ receives a packet, it decides with a certain probability whether to accept or reject it.
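A minimal sketch of the packet format (2) and the buffer layout of assumption (iii) follows. It assumes that readings are integers (so that the combining operation ⊕ can be a bitwise XOR); the class, function, and field names are hypothetical.

```python
from dataclasses import dataclass

INIT, UPDATE = 0, 1  # flag values: 0 = initialization packet, 1 = update packet


@dataclass
class Packet:
    """packet_{s_i} = (ID_{s_i}, x_{s_i}, c(x_{s_i}), flag), as in Eq. (2)."""
    node_id: int
    data: int        # sensed reading x_{s_i}, assumed to be an integer
    counter: int     # c(x_{s_i}): remaining number of hops
    flag: int = INIT


def num_buffers(M, c):
    """Number of small buffers per node, m = floor(M / c) (assumption iii)."""
    if c <= 0:
        raise ValueError("small buffer size c must be positive")
    return M // c


def new_storage(node_id, reading, M, c):
    """Allocate m small buffers; buffer 0 is reserved for the node's own reading."""
    m = num_buffers(M, c)
    if m < 1:
        raise ValueError("M must hold at least one small buffer of size c")
    buffers = [0] * m
    buffers[0] = reading
    return {"id": node_id, "buffers": buffers, "stored_ids": {node_id}}
```

The `stored_ids` set stands in for the list of packet IDs that each node keeps in the storage phase.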



III. Distributed Storage Algorithms 

In this section we present a distributed storage algorithm for wireless sensor networks and study its encoding and decoding operations. Previous algorithms assumed that k source nodes disseminate their sensed data throughout a network of n storage nodes by means of Fountain codes and random walks. In this work we generalize this scenario so that a set of n sources disseminates their data to a set of n storage nodes. The proposed algorithm also exploits properties of wireless sensor networks such as broadcasting and flooding.

A. Encoding Operations 

We present a distributed storage algorithm (DSA-I) for wireless sensor networks. The DSA-I algorithm consists of three main phases: initialization, encoding/flooding, and storage. Each phase is described as follows.

1) Initialization Phase: Every node $s_i$ in S has an ID $ID_{s_i}$ and a reading (sensed data) $x_{s_i}$. In the initialization phase, node $s_i$ prepares a packet $\text{packet}_{s_i}$ containing this information, a counter $c(x_{s_i})$ that determines the maximum number of hops that $x_{s_i}$ will travel, and a flag that is set to zero. Every message $x_{s_i}$ has its own threshold value $c(x_{s_i})$, set by the sender $s_i$ based on its set of neighbors $\mathcal{N}(s_i)$, so that this value depends on the degree $d(s_i)$. If node $s_i$ has few neighbors, $c(x_{s_i})$ will be large, whereas a node with a large number of neighbors chooses a small counter $c(x_{s_i})$. In other words, every node decides its own counter.

$$\text{packet}_{s_i} = (ID_{s_i}, x_{s_i}, c(x_{s_i}), \text{flag}) \tag{3}$$

Node $s_i$ broadcasts this packet to all neighboring nodes $\mathcal{N}(s_i)$.

2) Encoding and Flooding Phase:

• After the initialization phase, every node u receiving $\text{packet}_{s_i}$ accepts the data $x_{s_i}$ with probability one and adds it to its buffer $y_u$:

$$y_u^{+} = y_u^{-} \oplus x_{s_i}. \tag{4}$$

• Node u decreases the counter by one:

$$c(x_{s_i}) = c(x_{s_i}) - 1. \tag{5}$$

• Node u selects the neighbors that have not received $x_{s_i}$ and sends them the message using multicasting.

• An arbitrary node v that receives the message from u checks whether $x_{s_i}$ has been received before; if so, v discards it. If not, v runs a random trial to decide whether to accept or reject it. If v accepts it, v adds the data to its buffer, $y_v^{+} = y_v^{-} \oplus x_{s_i}$, and decreases the counter, $c(x_{s_i}) = c(x_{s_i}) - 1$.

• Node v then checks whether the counter has reached zero; if not, v forwards the message, using multicasting, to the neighboring nodes that have not received it.

3) Storage Phase: Every node maintains its own buffer by storing a copy of its data and other nodes' data. A node also stores a list of the IDs of the packets that reached it. After all nodes have received, sent, and stored their own and their neighbors' data, every node maintains a buffer containing data from a subset of the network nodes.
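The three phases above can be sketched as a small round-based simulation in Python. This is a sketch under stated assumptions, not the authors' implementation: readings are integers combined by bitwise XOR, each node keeps a single XOR buffer instead of m separate buffers, nodes beyond the first hop accept a new packet with a fixed probability `accept_prob` (the acceptance distribution is not specified here), and only accepting nodes forward the packet, as in Algorithm 1. The function name and parameters are hypothetical.

```python
import random
from collections import deque


def dsa1_disseminate(adj, readings, accept_prob=0.5, seed=0):
    """Round-based sketch of the DSA-I initialization and flooding phases.

    adj: dict node -> set of neighbour nodes; readings: dict node -> int.
    Returns (y, ids): y[u] is u's XOR buffer, ids[u] the source IDs mixed into it.
    """
    rng = random.Random(seed)
    n = len(adj)
    y = {u: readings[u] for u in adj}      # first buffer: own reading
    ids = {u: {u} for u in adj}            # IDs of packets stored at u
    seen = {u: {u} for u in adj}           # IDs of packets that reached u
    queue = deque()                        # (holder, source, remaining counter)

    # Initialization: c(x_s) = floor(n / d(s)); neighbours accept with probability one.
    for s, nbrs in adj.items():
        if not nbrs:
            continue                       # isolated node: nothing to flood
        counter = n // len(nbrs) - 1       # decremented after the first hop, Eq. (5)
        for u in nbrs:
            y[u] ^= readings[s]
            ids[u].add(s)
            seen[u].add(s)
            if counter > 0:
                queue.append((u, s, counter))

    # Flooding: multicast to neighbours; first-time receivers accept at random.
    while queue:
        u, s, counter = queue.popleft()
        for v in adj[u]:
            if s in seen[v]:
                continue                   # received before: discard
            seen[v].add(s)
            if rng.random() < accept_prob:  # assumed acceptance rule
                y[v] ^= readings[s]
                ids[v].add(s)
                if counter - 1 > 0:         # only accepting nodes forward
                    queue.append((v, s, counter - 1))
    return y, ids
```

The queue plays the role of the per-node forward queues, and the `seen` sets implement the rule that a node discards a packet it has already received. On a complete graph every counter is 1, so every node ends up with the XOR of all readings after a single flood.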



B. Decoding Operations 

The stored data can be recovered by querying a number of nodes in the network. Let n be the total number of alive nodes, and assume that every node has $m = \lfloor M/c \rfloor$ buffers, where c is the size of a small buffer and M is the total buffer size of a node. Then the data collector needs to query at least $(1+\epsilon)n/m$ nodes in order to retrieve the information about the n variables.
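The decoding step above does not prescribe a particular decoder. Assuming that each queried node returns its XOR buffer together with the list of packet IDs it stored (as kept in the storage phase), one option is Gaussian elimination over GF(2), sketched below in Python. The function name and input format are hypothetical, and this is not necessarily the decoder used in this work.

```python
def decode_xor(equations):
    """Recover source readings from queried XOR buffers by GF(2) elimination.

    equations: iterable of (ids, value) pairs, where value is the XOR of the
    readings x_i for i in ids. Returns {i: x_i} for every reading that the
    equations determine uniquely.
    """
    basis = {}  # leading bit -> (mask, value), one row per pivot
    for ids, value in equations:
        mask = 0
        for i in ids:
            mask |= 1 << i
        while mask:  # reduce the new row against the current basis
            lead = mask.bit_length() - 1
            if lead not in basis:
                basis[lead] = (mask, value)
                break
            bmask, bval = basis[lead]
            mask, value = mask ^ bmask, value ^ bval

    # Reduced row echelon form: clear each pivot bit from the rows above it.
    leads = sorted(basis)
    for k, low in enumerate(leads):
        lmask, lval = basis[low]
        for high in leads[k + 1:]:
            hmask, hval = basis[high]
            if (hmask >> low) & 1:
                basis[high] = (hmask ^ lmask, hval ^ lval)

    # A reading is recovered exactly when its row has been reduced to a single bit.
    return {lead: val for lead, (mask, val) in basis.items() if mask == 1 << lead}
```

The collector would pass one `(ids, y)` pair per queried node; decoding is complete when the returned dictionary has all n keys.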
Input: A sensor network with source nodes $S = \{s_1, \ldots, s_n\}$, n source
    packets $x_{s_1}, \ldots, x_{s_n}$, and a positive constant $c(s_i)$.
Output: Storage buffers $y_1, y_2, \ldots, y_n$ for all sensors in S.

foreach node u = 1 : n do
    Generate $d_c(u)$ according to $\Omega_{is}(d)$ (or $\Omega_{rs}(d)$) and a set of
    neighbors $\mathcal{N}(u)$ using flooding;
end
foreach source node $s_i$, i = 1 : n do
    Generate the header of $x_{s_i}$ and set token = 0;
    Set counter $c(x_{s_i}) = \lfloor n/d(s_i) \rfloor$;
    Flood $x_{s_i}$ to all $\mathcal{N}(s_i)$ uniformly at random; send $x_{s_i}$ to $u \in \mathcal{N}(s_i)$;
    With probability 1, $y_u = y_u \oplus x_{s_i}$;
    Put $x_{s_i}$ into u's forward queue;
    $c(x_{s_i}) = c(x_{s_i}) - 1$;
end
while source packets remain do
    foreach node u that received packets before the current round do
        Choose $v \in \mathcal{N}(u)$ uniformly at random;
        Send packet $x_{s_i}$ in u's forward queue to v;
        if v receives $x_{s_i}$ for the first time then
            coin = rand(1);   // flip a coin to accept or reject the packet
            if coin is below v's acceptance probability then
                $y_v = y_v \oplus x_{s_i}$;
                Put $x_{s_i}$ into v's forward queue;
                $c(x_{s_i}) = c(x_{s_i}) - 1$;
            end
        else if $c(x_{s_i}) > 1$ then
            Put $x_{s_i}$ into v's forward queue;
            $c(x_{s_i}) = c(x_{s_i}) - 1$;
        else
            Discard $x_{s_i}$;   // hence $c(x_{s_i}) = 1$ or there is no node to send to
        end
    end
end

Algorithm 1: DSA-I Algorithm: Distributed storage algorithm for a WSN where
the data is disseminated using multicasting and flooding to all neighbors.



IV. DSA-I Analysis 

We shall provide analysis for the DSA-I algorithm shown in 
the previous section. The main idea is to utilize flooding and 
the node degree of each node to disseminate the sensed data 
from sensors throughout the network. We note that nodes with 
large degree will have smaller counters in their packets such 
that their packets will travel for minimal number of neighbors. 
Also, nodes with smaller degree will have larger counters such 
that their packets will be disseminated to many neighbors as 
possible. 

The following lemma establishes the number of hops (steps) that every packet travels in the network.

Lemma 1: On average, with high probability, the total number of steps for one packet originated by a node u along one branch in DSA-I is

$$O(n/\mu). \tag{6}$$

Proof: Let u be a node originating a packet $\text{packet}_u$, and let u have degree $d(u)$. For any arbitrary node v, $\text{packet}_u$ is forwarded only if it visits v for the first time or the counter satisfies $c(x_u) \ge 2$. Every packet originated from a node u has a counter given by

$$c(x_u) = \lfloor n/d(u) \rfloor. \tag{7}$$

Let $\mu$ be the mean degree of the abstract graph representing the network $\mathcal{N}$ (see Definition [23]), and assume that, on average, every packet is sent to $\mu$ neighboring nodes. Approximating the degree of an arbitrary node u by the mean degree $\mu$ gives $c(x_u) \approx n/\mu$, and the result follows.

■

Equation (7) implies that if $d(u) > n/2$, then $c(x_u) = 1$ and node u floods its packet only once. In addition, nodes with smaller degrees need a larger number of steps to disseminate their packets.
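As a worked illustration of (7) and Lemma 1, the Python helpers below compute the counters for a given degree map and the approximation $n/\mu$. The function names are hypothetical. For example, with n = 100 a node of degree 1 gets counter 100, a node of degree 2 gets counter 50, and any node with $d(u) > 50$ gets counter 1.

```python
def dsa1_counters(degrees, n):
    """Counter c(x_u) = floor(n / d(u)) of Eq. (7) for every non-isolated node."""
    return {u: n // d for u, d in degrees.items() if d > 0}


def lemma1_hops(degrees, n):
    """Mean-degree approximation n / mu of the per-branch step count in Lemma 1."""
    mu = sum(degrees.values()) / len(degrees)
    return n / mu
```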

If the total number of nodes is not known, node u can initiate a random walk to estimate it. In Section V we propose a different algorithm that neither estimates n nor uses random walks in a graph.

The following lemma gives the total number of transmissions required to disseminate the information throughout the network.

Lemma 2: Let $\mathcal{N}$ be an instance model of a wireless sensor network with n sensor nodes. The total number of transmissions required to disseminate the information from any arbitrary node throughout the network is

$$O(n). \tag{8}$$

Proof: Let $d(s_i)$ be the degree (number of neighbors with a direct connection) of a sensor node $s_i$. On average, the mean degree $\mu$ of the set of sensors S is approximated by $\frac{1}{n}\sum_{i=1}^{n} d(s_i)$. Each flooding step takes $O(1)$ running time to reach the $d(s_i)$ neighbors. By Lemma 1, at least $n/\mu$ steps are needed to disseminate the information from a sensor $s_i$, and at each step a sensor sends $\mu$ messages to its neighbors on average. Hence the total number of transmissions is $(n/\mu)\cdot\mu = O(n)$.

■

The following theorem gives the encoding complexity of the DSA-I algorithm.

Theorem 3: The encoding operations of the DSA-I algorithm, i.e., the total number of transmissions required to disseminate the information sensed by all nodes, amount to

$$O(n^2). \tag{9}$$
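One way to see the bound in Theorem 3, under the mean-degree approximation used in Lemmas 1 and 2, is to sum the per-source cost of Lemma 2 over all n sources. Here $T_{\text{DSA-I}}$ is a symbol introduced for convenience to denote the total number of transmissions:

```latex
\begin{align*}
T_{\text{DSA-I}}
  = \sum_{i=1}^{n}
    \underbrace{O\!\left(\frac{n}{\mu}\right)}_{\text{steps per packet (Lemma 1)}}
    \cdot
    \underbrace{O(\mu)}_{\text{messages per step}}
  = \sum_{i=1}^{n} O(n)
  = O(n^{2}).
\end{align*}
```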

V. DSA-II Algorithm Without Knowing Global Information

In DSA-I we assumed that every sensing and storage node knows the total number of nodes in advance. This might not be the case, since nodes may join and leave the network at various times because of their limited CPU power and short lifetime. Therefore, one needs a network storage algorithm that does not depend on the total number of nodes.

In this section we develop a distributed storage algorithm (DSA-II) that is fully distributed and does not require global information. The objective is for each node u to estimate the value of its counter $c(u)$, i.e., the number of steps over which each of its packets is disseminated in the network. In DSA-II each node u first performs an inference phase that calculates $c(u)$ from the degree of u and the degrees of its neighbors $\mathcal{N}(u)$. We also assume a system parameter $c_u$ that depends on the network conditions and the node's degree.

Inference Phase: Let u be an arbitrary node in a distributed network $\mathcal{N}$. In the inference phase, each node u dynamically determines the value of its counter $c(u)$. Node u knows its neighbors $\mathcal{N}(u)$, which it learns in the flooding phase. Furthermore, each node $v \in \mathcal{N}(u)$ knows the degrees of its neighbors.

The inference phase is dynamic in the sense that every node in the network decides the value of its counter separately. Nodes with large degrees have a high chance of forwarding their data throughout the network to a large number of nodes.

The encoding operations of DSA-II are similar to those of DSA-I, except that DSA-II uses an inference phase in which the number of forwarding steps is determined first. Let v be a node connected to a source node u, and let $b_v$ be the degree of v without counting the nodes in $\mathcal{N}(u) \cup \{u\}$. We define the counter $c(u)$ as

$$c(u) = c_u \left\lfloor \frac{1}{d(u)} \sum_{v \in \mathcal{N}(u)} b_v \right\rfloor. \tag{10}$$
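Equation (10) needs only local information: the neighbors of u and, for each neighbor v, how many of v's neighbors lie outside $\mathcal{N}(u) \cup \{u\}$. A minimal Python sketch, assuming the neighborhood sets are available from the flooding phase; the function name is hypothetical, the system parameter $c_u$ is simply passed in, and the repeated inference at nodes with a single outside neighbor (Algorithm 2) is not modelled.

```python
def dsa2_counter(adj, u, c_u):
    """DSA-II counter of Eq. (10): c(u) = c_u * floor(sum_{v in N(u)} b_v / d(u)).

    adj maps each node to the set of its neighbours; b_v counts v's neighbours
    outside N(u) and u itself.
    """
    nbrs = adj[u]
    if not nbrs:
        return 0                      # isolated node: no packet is forwarded
    excluded = nbrs | {u}
    b = {v: len(adj[v] - excluded) for v in nbrs}
    return c_u * (sum(b.values()) // len(nbrs))
```

For instance, if u has neighbors v1 and v2 with two and one outside neighbors respectively, then $\lfloor 3/2 \rfloor = 1$ and $c(u) = c_u$.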

Encoding and Flooding Phase:

• After the inference and initialization phases, every node u receiving $\text{packet}_{s_i}$ accepts the data $x_{s_i}$ with probability one and adds it to its buffer $y_u$:

$$y_u^{+} = y_u^{-} \oplus x_{s_i}. \tag{11}$$

• Node u decreases the counter by one:

$$c(x_{s_i}) = c(x_{s_i}) - 1. \tag{12}$$

• Node u selects the neighbors that have not received $x_{s_i}$ and sends them the message using multicasting.

• An arbitrary node v that receives the message from u checks whether $x_{s_i}$ has been received before; if so, v discards it. If not, v runs a random trial to decide whether to accept or reject it. If v accepts it, v adds the data to its buffer, $y_v^{+} = y_v^{-} \oplus x_{s_i}$, and decreases the counter, $c(x_{s_i}) = c(x_{s_i}) - 1$.

• Node v then checks whether the counter has reached zero; if not, v forwards the message to the neighboring nodes that have not received it.

Storage Phase: Every node maintains its own buffer by storing a copy of its data and other nodes' data. Also, a node



Input: A sensor network $\mathcal{N}$ with source nodes $S = \{s_1, \ldots, s_i, \ldots\}$ and
    source packets $x_{s_1}, \ldots, x_{s_i}, \ldots$
Output: Storage buffers $y_1, y_2, \ldots, y_i, \ldots$ for all sensors in S.

foreach node u in $\mathcal{N}$ do
    Determine a set of neighbors $\mathcal{N}(u)$ using flooding;
    Determine a system parameter $c_u$;
end
Inference Phase
foreach source node u in $\mathcal{N}$ do
    Query the neighbors $\mathcal{N}(u)$ of u for their degrees;
    Let $v \in \mathcal{N}(u)$ and let $b_v$ be the degree of v without counting
    the nodes in $\mathcal{N}(u) \cup \{u\}$;
    if $d_v = 1$ then
        Repeat the inference phase at v;
        Repeat until $b_{v'} > 1$ for some $v' \in \mathcal{N}(v)$;
        Put $b_v = \sum_{v'} d_{v'}$;
    end
    $c(u) = c_u \left\lfloor \frac{1}{d(u)} \sum_{v \in \mathcal{N}(u)} b_v \right\rfloor$;
end
foreach source node $s_i$ in $\mathcal{N}$ do
    Generate the header of $x_{s_i}$ and set token = 0;
    Flood $x_{s_i}$ to all $\mathcal{N}(s_i)$ uniformly at random; send $x_{s_i}$ to $u \in \mathcal{N}(s_i)$;
    With probability 1, $y_u = y_u \oplus x_{s_i}$;
    Put $x_{s_i}$ into u's forward queue;
    $c(x_{s_i}) = c(x_{s_i}) - 1$;
end
while source packets remain do
    Run the encoding and flooding phase of the DSA-I algorithm;
end

Algorithm 2: DSA-II Algorithm: Distributed storage algorithm for a WSN without
knowing global information, where the data is disseminated using multicasting
and flooding to all neighbors.



will store a list of the IDs of the packets that reached it. After all nodes have received, sent, and stored their own and their neighbors' data, every node maintains a buffer containing data from a subset of the network nodes.



VI. DSA-II Analysis 

We now analyze the DSA-II algorithm presented in the previous section. The main idea is again to use flooding and node degrees to disseminate the sensed data throughout the network. Nodes with a large degree have smaller counters in their packets, so that their packets travel a minimal number of hops, whereas nodes with a smaller degree have larger counters, so that their packets reach as many neighbors as possible.

The following lemma establishes the number of hops (steps) that every packet travels in the network. Let $\lambda$ be the average node density ifTTl .

Lemma 4: On average, for a uniformly distributed network, the total number of steps for one packet originated by a node u along one branch in DSA-II is

$$O(\mu - \lambda). \tag{13}$$



Proof: Let u be a node originating a packet $\text{packet}_u$ with degree $d(u)$. When the nodes are uniformly distributed in the network, we can approximate $d(u)$ by $\mu$. Every packet originated from a node u has a counter given by

$$c(u) = c_u \left\lfloor \frac{1}{d(u)} \sum_{v \in \mathcal{N}(u)} b_v \right\rfloor. \tag{14}$$

We choose $c_u$ to be inversely proportional to the node degree, so that nodes with a small number of neighbors take large values of $c_u$ and vice versa. Also, if a node v has only one neighbor other than the originating node u, we traverse through v until we reach a node $v'$ with degree $b_{v'} > 1$.

Assume that, on average, every packet is sent to $\mu$ neighboring nodes. Approximating $b_v$ by $\mu - \lambda$, we can rewrite $\sum_{v \in \mathcal{N}(u)} b_v / d(u)$ as $\mu(\mu - \lambda)/\mu = \mu - \lambda$, so that, treating $c_u$ as a constant, $c(u) = O(\mu - \lambda)$. For any arbitrary node v, $\text{packet}_u$ is forwarded only if it visits v for the first time or the counter satisfies $c(x_u) \ge 2$.

■ 

The following lemma shows the total number of transmis- 
sions required to disseminate the information throughout the 
network. 

Lemma 5: Let $\mathcal{N}$ be an instance model of a wireless sensor network with n uniformly distributed sensor nodes. The total number of transmissions required to disseminate the information from any arbitrary node throughout the network is

$$O(\mu(\mu - \lambda)). \tag{15}$$

Proof: Let $d(s_i)$ be the degree (number of neighbors with a direct connection) of a sensor node $s_i$. On average, the mean degree $\mu$ of the set of sensors S is approximated by $\frac{1}{n}\sum_{i=1}^{n} d(s_i)$. Each flooding step takes $O(1)$ running time to reach the $d(s_i)$ neighbors. By Lemma 4, at least $\mu - \lambda$ steps are needed to disseminate the information from a sensor $s_i$, and at each step a sensor sends $\mu$ messages to its neighbors on average. Hence the total number of transmissions is $(\mu - \lambda)\mu = O(\mu(\mu - \lambda))$. ■

The following theorem gives the encoding complexity of the DSA-II algorithm.

Theorem 6: The encoding operations of the DSA-II algorithm, i.e., the total number of transmissions required to disseminate the information sensed by all nodes, amount to

$$O(\mu(\mu - \lambda)n). \tag{16}$$
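As with Theorem 3, the bound in Theorem 6 can be read as the per-source cost of Lemma 5 summed over all n sources, with $c_u$ treated as a constant. Here $T_{\text{DSA-II}}$ is a symbol introduced for convenience to denote the total number of transmissions:

```latex
\begin{align*}
T_{\text{DSA-II}}
  = \sum_{i=1}^{n}
    \underbrace{O(\mu - \lambda)}_{\text{steps per packet (Lemma 4)}}
    \cdot
    \underbrace{O(\mu)}_{\text{messages per step}}
  = O\big(\mu(\mu - \lambda)\,n\big).
\end{align*}
```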



VII. Performance and Simulation Results 

In this section we simulate the distributed storage algorithms DSA-I and DSA-II presented in the previous sections. The main performance metric we investigate is the successful decoding probability versus the decoding ratio.

Let $P_s$ be the successful decoding probability, defined as the ratio of the number $M_s$ of successful trials in recovering all n variables (symbols) to the total number of trials. Also, let h be the total




Fig. 5. A WSN with n nodes randomly distributed in a field. The successful decoding probability versus the decoding ratio is shown for n = 50, 100, 150 with the DSA-I algorithm.



number of queries needed to recover those n variables. We define the decoding ratio as the number of queried nodes divided by n, i.e., h/n.

Definition 7 (Decoding Ratio): The decoding ratio $\eta$ is the ratio between the number of queried nodes h and the number of sources n, i.e.,

$$\eta = \frac{h}{n}. \tag{17}$$

Definition 8 (Successful Decoding Probability): The successful decoding probability $P_s$ is the probability that all n source packets are recovered from the h queried nodes.

In our simulation, $P_s$ is evaluated as follows. Suppose the network has n nodes and we query h nodes. There are $\binom{n}{h}$ ways to choose such h nodes; we pick a set S of these choices uniformly at random, with S chosen large enough to give stable results. Given that S is a fraction $0 < r < 1$ of all possible combinations, we define M as

$$M = r\,\frac{n!}{h!\,(n-h)!}. \tag{18}$$

Let $M_s$ be the size of the subset of these M choices of h query nodes from which all n source packets can be recovered. Then we evaluate the successful decoding probability as

$$P_s = \frac{M_s}{M}. \tag{19}$$



We ran the experiment over a network with area $A = [L,L]^2$ grid and with different node densities. We evaluated the performance for various decoding ratios, depending on the total number of nodes in the network, with an incremental step of 0.1.

For a decoding ratio $\eta$ we select h nodes for our test. The number of combinations to choose from can be very large, on the order of $100^{100}$, so we choose a fair portion r of these combinations, $N \ll r \ll M$, and average the results over these experiments.
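The sampling procedure above can be sketched in Python as a Monte Carlo estimate of $P_s = M_s/M$. This sketch assumes that each storage node's buffer is described by the set of source IDs XORed into it and that a query set succeeds exactly when those XOR equations have full rank n over GF(2); the function names and the `trials` parameter (playing the role of M) are hypothetical.

```python
import random


def gf2_rank(masks):
    """Rank over GF(2) of XOR equations encoded as integer bit masks."""
    basis = {}  # leading bit -> mask
    for mask in masks:
        while mask:
            lead = mask.bit_length() - 1
            if lead not in basis:
                basis[lead] = mask
                break
            mask ^= basis[lead]
    return len(basis)


def estimate_ps(node_ids, n, h, trials, seed=0):
    """Monte Carlo estimate of P_s = M_s / M with M = trials sampled query sets.

    node_ids[k] is the set of source IDs XORed into storage node k's buffer.
    A query set of h nodes succeeds when its equations have rank n, i.e. when
    all n source packets can be recovered.
    """
    rng = random.Random(seed)
    masks = [sum(1 << i for i in ids) for ids in node_ids]
    successes = 0  # M_s
    for _ in range(trials):
        chosen = rng.sample(range(len(masks)), h)
        if gf2_rank(masks[k] for k in chosen) == n:
            successes += 1
    return successes / trials
```

For a decoding ratio $\eta$, one would call it with `h = round(eta * n)`. Exhaustive enumeration is impractical even for moderate n: `math.comb(100, 50)` is roughly $1.0 \times 10^{29}$, which is why only a sampled portion of the query sets is evaluated.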

Fig. 6. A WSN with n nodes randomly distributed in a field. The successful decoding probability versus the decoding ratio is shown for n = 30, 40, 50 with the DSA-II algorithm.



Fig. 5 shows the decoding performance of the DSA-I algorithm with the Ideal Soliton distribution for a small number of nodes. We ran the experiment over a network with area $A = [2,2]^2$ grid and node density $2.5 \le \lambda \le 12.5$. We evaluated the performance for decoding ratios $0.1 \le \eta \le 1$ with an incremental step of 0.1.

From these results we can see that the successful decoding probability increases as the node density increases while the decoding ratio $\eta$ is kept constant. The successful decoding probability is above 70% when the decoding ratio is about 20%-30%. Another observation is that with a node density $\lambda > 8$, the successful decoding probability satisfies $P_s > 90\%$.

Fig. 7 shows the decoding performance of the DSA-I algorithm with the Ideal Soliton distribution for a medium number of nodes. The network is deployed in $A = [5,5]^2$ with node density $\lambda$ ranging from 4 to 20. The simulation results show that the successful decoding probability increases with $\lambda$ and approaches 1 for $\eta > 20\%$ and $\lambda > 12$.

Fig. 6 shows the decoding performance of the DSA-II algorithm with the Ideal Soliton distribution for a small number of nodes. We ran the first experiment over a network with area $A = [2,2]^2$ grid and node density $2.5 \le \lambda \le 12.5$, and evaluated the performance for decoding ratios $0.1 \le \eta \le 1$ with an incremental step of 0.1. As shown in the figure, the DSA-II algorithm achieved results similar to those of DSA-I, with a successful decoding probability $P_s > 70\%$ for a decoding ratio $\eta > 0.4$.

Fig. 8 shows a comparison between the buffer sizes in DSA-I and DSA-II in a network deployed in an area $A = [5,5]^2$. The results indicate that the buffer size is approximately 10% of the network size n. Fig. 8 also shows that the buffer size is strongly related to the network density $\lambda$.
Fig. 7. A WSN with n nodes randomly distributed in a field. The successful decoding probability versus the decoding ratio is shown for n = 200, 400, 600 with the DSA-I algorithm.



VIII. Evaluation and Practical Aspects 

In this section we evaluate and compare the DSA-I and DSA-II algorithms with related work on distributed storage algorithms. Previous work focused on using random walks and Fountain codes to disseminate data sensed by a set of sensors throughout the network, and it relied on global and geographical information such as the total number of nodes, routing tables, and node locations. In this work we do not assume such global information.

The main goal of this work is to design data collection algorithms that can be used in large-scale wireless sensor networks. We achieve this goal by disseminating data throughout the network using data flooding once at every sensor node, and then adding some redundancy at other neighboring nodes using random walks and packet trapping. Every storage node keeps track of the IDs of the nodes from which it accepts or rejects packets.

The main advantages of the proposed algorithms are as follows:

i) One does not need to query all nodes in the network in order to retrieve information about all n nodes; only 20%-30% of the total nodes need to be queried.

ii) One can query a single arbitrary node u in a certain region of the network to obtain information about that region.

A. Sensing New Data 

The proposed algorithms also handle data updates. Assume that a node $u$ has sensed data $x_u$ and disseminated it throughout the network by flooding, as in the DSA-I and DSA-II algorithms. In this case the flag is set to zero, and node $u$ originates the packet

$$\mathrm{packet}_u = (\mathrm{ID}_u, x_u, c(x_u), \mathrm{flag}). \tag{20}$$

Every node $v$ that stores a copy of $x_u$ also maintains a list of IDs that includes $\mathrm{ID}_u$.
Fig. 8. A comparison between the DSA-I and DSA-II buffer sizes for various node densities in a medium-size network. Increasing the number of sensor nodes increases the number of buffers linearly.



Let $x'_u$ be the new data sensed by node $u$. When node $u$ wants to update its value, it sends an update message with the flag set to one:

$$\mathrm{packet}_u = (\mathrm{ID}_u, x_u \oplus x'_u, c(x_u), \mathrm{flag}). \tag{21}$$

The old and new data are XORed in this packet. Every storage node $v$ checks the flag to determine whether the packet is an update or an initial packet. It also checks whether $\mathrm{ID}_u$ is in its own list. Once node $v$ accepts the incoming update packet, it updates its target buffer $y_v$ as

$$y_v^{+} = y_v \oplus x_u \oplus x'_u. \tag{22}$$

Since $y_v$ already contains $x_u$ as an XOR term, the result is the same buffer with $x_u$ replaced by $x'_u$.
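The update rule in (20) to (22) can be checked with a small Python sketch. As (22) implies, it assumes that a storage buffer $y_v$ is the XOR of the data it has accepted. Payloads are modelled as integers and the check value $c(x_u)$ is omitted. Every initial packet is accepted unconditionally, which is a simplification rather than the acceptance rule of DSA-I or DSA-II. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StorageNode:
    """Hypothetical storage node v holding an XOR buffer y_v and the list of
    source IDs whose data it has accepted."""
    buffer: int = 0
    accepted_ids: set = field(default_factory=set)

    def receive(self, src_id: int, payload: int, flag: int) -> bool:
        """Process a packet (ID_u, payload, flag); return True if the buffer changed."""
        if flag == 0:
            # Initial packet (20): payload is x_u. Accepted unconditionally here
            # (simplifying assumption); a source is never XORed in twice.
            if src_id in self.accepted_ids:
                return False
            self.accepted_ids.add(src_id)
            self.buffer ^= payload
            return True
        # Update packet (21): payload is x_u XOR x'_u. Only nodes whose ID list
        # contains ID_u apply it, following (22): y_v <- y_v XOR x_u XOR x'_u.
        if src_id not in self.accepted_ids:
            return False
        self.buffer ^= payload
        return True


if __name__ == "__main__":
    v = StorageNode()
    v.receive(1, 0b1010, flag=0)           # x_1 = 1010
    v.receive(2, 0b0110, flag=0)           # x_2 = 0110
    old_x1, new_x1 = 0b1010, 0b0011
    v.receive(1, old_x1 ^ new_x1, flag=1)  # update x_1 to 0011
    print(bin(v.buffer))                   # equals new_x1 ^ x_2
```

XOR is its own inverse. XORing $x_u \oplus x'_u$ into a buffer that contains $x_u$ therefore cancels the old value and leaves $x'_u$ in its place. Nodes that never accepted $x_u$ must ignore the update, which is why the ID list is checked first.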



B. Practical Aspects 

The proposed algorithms can be deployed in large-scale wireless sensor networks where the geographic locations of sensor nodes are not known. In addition, no sensor needs to maintain a routing table of its neighboring nodes. Such applications include WSNs deployed in forests and burned fields, where fires, floods, and other disaster phenomena must be monitored and detected. The algorithms can also be deployed in large, crowded fields where many nodes are scattered to collect data.

The proposed data collection and storage algorithms can be deployed in the Minna and Arafat fields, southeast of Makkah, during the pilgrimage. Fig. 9 shows camp tents located in the Minna field, east of Makkah. The tents are equipped with air conditioning, electricity, and gas supplies. The sensor devices are distributed randomly to measure gas pollution, detect fires, collect data, and learn about the environment. The data storage devices receive the data collected by the sensors and send it to the main server for further analysis. More details and practical aspects of this model will be explained in our future work.



IX. Related Work 

Wireless vision sensor networks consist of small devices that can be scattered in a field or deployed in a network to measure certain phenomena. In this section we present previous work on network storage codes that is relevant to our work. Distributed network storage codes such as Fountain codes are used along with random walks to distribute data from a set of $k$ sources to a set of $n \gg k$ storage nodes; see [2], 0. In this work, however, we generalize this scenario to the case where a set of $n$ sources disseminates its data to a set of $n$ storage nodes.

The most notable work on distributed storage algorithms for wireless sensor networks is summarized as follows.

• Dimakis et al. in 0, [5] used a decentralized implementation of Fountain codes that relies on geographic routing, so every node has to know its location. They preferred Fountain codes to random linear codes because Fountain codes need $O(k \log k)$ decoding complexity. Random linear codes and RS codes, by contrast, need $O(k^3)$ decoding complexity, where $k$ is the number of data blocks to be encoded. In addition, the degree $d$ of the collector nodes is not known in advance [9]. The authors propose a randomized algorithm that constructs Fountain codes over a grid network using only the geographical knowledge of nodes and local randomized decisions. They also used fast random walks to disseminate source data to the storage nodes.

• Lin et al. in [9], [10] studied the question "how to retrieve historical data that the sensors have gathered even if some sensors are destroyed or disappear from the network?" They analyzed techniques to increase the "persistence" of sensed data in a random wireless sensor network. They proposed two decentralized algorithms that use Fountain codes to guarantee the persistence and reliability of data cached on unreliable sensors. Both use random walks to disseminate data from a sensor (source) node to a set of other storage nodes. The first algorithm introduces lower overhead than a naive random walk. The second algorithm has a lower level of fault tolerance than the original centralized Fountain code but incurs a much lower dissemination cost. They proposed the first efficient and scalable decentralized implementation of Fountain codes in sensor networks. The authors did not use routing tables to disseminate data from one sensor to a set of sensors, because a sensor does not have enough energy or memory to maintain a routing table that scales with the size of the network.

• Kamra et al. in [8] proposed a technique called growth codes to increase data persistence in wireless sensor networks, i.e., to increase the amount of information that can be recovered at the sink. Growth codes are a linear technique in which information is encoded online, in a distributed way, with increasing degree. They defined the persistence of a sensor network as "the fraction of data generated within the network that eventually reaches the sink" [8]. They showed that growth codes can increase the amount of information that can be recovered at any storage node at any time, even when some other nodes fail. They do not use the Robust or Ideal Soliton distributions. Instead, they propose a new distribution that depends on the network condition to determine the degrees of the storage nodes. The key aspects of their work are as follows. 1) The positions of the nodes are not known, so a sensor node does not need to know the positions of other nodes. 2) The nodes are updated in rounds, so the degree of a symbol increases with time $t$; this is the idea behind growth codes. 3) They provide practical implementations of growth codes and compare their performance with that of other codes. 4) Decoding is done by querying an arbitrary sink. If the original sensed data have been collected correctly, decoding finishes; otherwise, another sink node is queried.

• The authors in ffl. B studied a model of distributed network storage algorithms for wireless sensor networks in which $k$ sensor nodes (sources) disseminate their data to $n$ storage nodes with low computational complexity. They solved this problem using Fountain codes and random walks on graphs. They also assumed that the total numbers of sources and storage nodes are not known. In other words, they gave an algorithm with which every node in the network can estimate the number of sources and the total number of nodes.

In this work we propose a different system for wireless sensor networks in which every node acts as a source as well as a storage/receiver node. The encoding operations that a node performs to disseminate its data are linear and take less computational time than those in previous work.

Fig. 9. Wireless sensor devices are scattered in the Minna field, east of Makkah, to gather and collect data about the environment. Such sensors are able to detect fires, gas pollution, and other disaster phenomena. They are needed to monitor the large number of camp tents in the Minna field.

X. Conclusion 

In this work we presented two distributed storage algorithms for large-scale wireless sensor networks. Given $n$ storage nodes with limited buffers, we demonstrated schemes that disseminate sensed data throughout the network with low computational overhead. The results show that only 20% to 30% of the network nodes need to be queried to retrieve the data collected by the $n$ sensing nodes when the buffer size is 10% of the network size. Our future work will address practical and implementation aspects of these algorithms to better serve the American-made camp tents in the Minna and Arafat fields, southeast of Makkah, KSA.
Appendix



Given a network $\mathcal{N}$ with graph $G$, the mean degree of the nodes in $G$ is defined as follows.

Definition 9: (Node Degree) Consider a graph $G = (V, E)$, where $V$ and $E$ denote the sets of nodes and links, respectively. Given $u, v \in V$, we say that $u$ and $v$ are adjacent (or $u$ is adjacent to $v$, and vice versa) if there exists a link between $u$ and $v$, i.e., $(u, v) \in E$. In this case, we also say that $u$ and $v$ are neighbors. Denote by $\mathcal{N}(u)$ the set of neighbors of a node $u$. The number of neighbors of $u$, i.e., the number of nodes with a direct connection to $u$, is called the node degree of $u$. It is denoted by $d(u)$, so that $|\mathcal{N}(u)| = d(u)$. The mean degree of a graph $G$ is given by

$$\mu = \frac{1}{|V|} \sum_{u \in V} d(u), \tag{23}$$

where $|V|$ is the total number of nodes in $G$.
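A minimal Python sketch of (23) follows. It assumes that the graph is simple (no self-loops or repeated links) and given as an undirected edge list. The helper names are hypothetical. It also checks the handshake identity $\sum_{u} d(u) = 2|E|$, which gives $\mu = 2|E|/|V|$.

```python
def adjacency_from_edges(nodes, edges):
    """Build the neighbour sets N(u) of a simple undirected graph G = (V, E)."""
    adj = {u: set() for u in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def mean_degree(adj):
    """Mean degree (23): mu = (1/|V|) * sum over u of d(u), with d(u) = |N(u)|."""
    if not adj:
        raise ValueError("the graph has no nodes")
    return sum(len(neighbours) for neighbours in adj.values()) / len(adj)


if __name__ == "__main__":
    # Star graph: node 0 linked to nodes 1, 2, 3, so the degrees are (3, 1, 1, 1).
    nodes = [0, 1, 2, 3]
    edges = [(0, 1), (0, 2), (0, 3)]
    adj = adjacency_from_edges(nodes, edges)
    mu = mean_degree(adj)
    assert mu == 2 * len(edges) / len(nodes)  # handshake identity: mu = 2|E|/|V|
    print(mu)  # 1.5
```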

The Ideal Soliton distribution $\Omega_{is}(i)$ for $k$ source blocks is given by

$$\Omega_{is}(i) = \Pr(d = i) = \begin{cases} \dfrac{1}{k}, & i = 1, \\[4pt] \dfrac{1}{i(i-1)}, & i = 2, 3, \ldots, k. \end{cases} \tag{24}$$
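Python's `fractions` module can compute the Ideal Soliton probabilities in (24) exactly. This sketch (function name hypothetical) also confirms that they sum to one, since $\frac{1}{k} + \sum_{i=2}^{k} \left(\frac{1}{i-1} - \frac{1}{i}\right) = \frac{1}{k} + 1 - \frac{1}{k} = 1$.

```python
from fractions import Fraction


def ideal_soliton(k):
    """Return {i: Pr(d = i)} for the Ideal Soliton distribution (24)."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    pmf = {1: Fraction(1, k)}
    for i in range(2, k + 1):
        pmf[i] = Fraction(1, i * (i - 1))
    return pmf


if __name__ == "__main__":
    pmf = ideal_soliton(4)
    print({i: str(p) for i, p in pmf.items()})  # {1: '1/4', 2: '1/2', 3: '1/6', 4: '1/12'}
    print(sum(pmf.values()))                    # 1
```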

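Let $R = c_0 \ln(k/\delta)\sqrt{k}$, where $c_0$ is a suitable constant and $0 < \delta < 1$. The Robust Soliton distribution for $k$ source blocks is defined as follows. Define

$$\tau(i) = \begin{cases} \dfrac{R}{ik}, & i = 1, \ldots, \dfrac{k}{R} - 1, \\[4pt] \dfrac{R \ln(R/\delta)}{k}, & i = \dfrac{k}{R}, \\[4pt] 0, & i = \dfrac{k}{R} + 1, \ldots, k, \end{cases} \tag{25}$$

and let

$$\beta = \sum_{i=1}^{k} \left( \tau(i) + \Omega_{is}(i) \right). \tag{26}$$

The Robust Soliton distribution is given by

$$\Omega_{rs}(i) = \frac{\tau(i) + \Omega_{is}(i)}{\beta} \quad \text{for all } i = 1, 2, \ldots, k. \tag{27}$$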
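A floating-point Python sketch of (25) to (27) follows. Because $k/R$ is generally not an integer, it places the spike of $\tau$ at $\lfloor k/R \rfloor$, clamped to the range $1, \ldots, k$. It also rejects parameters with $R \le \delta$, for which $\ln(R/\delta) \le 0$. The rounding convention, the parameter check, the parameter values in the example, and the function name are assumptions, not taken from the text.

```python
import math


def robust_soliton(k, c0, delta):
    """Return (mu, beta): mu[i] = Omega_rs(i) from (27) and the normaliser beta from (26)."""
    if k < 1 or c0 <= 0 or not 0 < delta < 1:
        raise ValueError("need k >= 1, c0 > 0 and 0 < delta < 1")
    R = c0 * math.log(k / delta) * math.sqrt(k)
    if R <= delta:
        raise ValueError("parameters give R <= delta, so ln(R/delta) <= 0")
    spike = max(1, min(k, int(k / R)))  # assumed rounding of k/R
    total = {}
    for i in range(1, k + 1):
        # Ideal Soliton term Omega_is(i) from (24).
        ideal = 1.0 / k if i == 1 else 1.0 / (i * (i - 1))
        # tau(i) from (25).
        if i < spike:
            tau = R / (i * k)
        elif i == spike:
            tau = R * math.log(R / delta) / k
        else:
            tau = 0.0
        total[i] = ideal + tau
    beta = sum(total.values())  # (26)
    return {i: t / beta for i, t in total.items()}, beta


if __name__ == "__main__":
    mu, beta = robust_soliton(k=100, c0=0.1, delta=0.5)
    print(round(sum(mu.values()), 12), beta)
```

Every $\tau(i)$ is non-negative under these checks, so $\beta \ge 1$. Dividing by $\beta$ renormalises the Ideal Soliton mass plus the extra mass that $\tau$ adds at low degrees and at the spike near $k/R$.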